A Organization of the Appendix

The appendix includes the missing proofs and detailed discussions of some arguments in the main body.

Neural Information Processing Systems

The proof of the infeasibility condition (Theorem 3.2) is provided in Section B. Explanations of the conditions derived in Theorem 3.2 are included in Section C. The proof of the properties of the proposed models (r)LogSpecT (Proposition 3.4) … The truncated-Hausdorff-distance-based proof details of Theorem 4.1 and Corollary 4.4 are … Details of L-ADMM and its convergence analysis are in Section F. Additional experiments and discussions on synthetic data are included in Section G.

Again, from Farkas' lemma, this implies that the following linear system does not have a solution: … From Example 3.1 we know δ = 2|h…

Since the constraint set S is a cone, it follows that γS = S for all γ > 0. Hence Opt(C, α) = α Opt(C, 1), which completes the proof.

The proof is conducted by constructing a feasible solution for rLogSpecT. Since LogSpecT is a convex problem and Slater's condition holds, the KKT conditions … We show that it is feasible for rLogSpecT.

For a function f : X → R, its epigraph is defined as epi f := {(x, y) | y ≥ f(x)}. Before presenting the proof, we first introduce the following lemma.
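The scaling identity Opt(C, α) = α Opt(C, 1) admits a one-line derivation. The excerpt does not state the objective, so the following sketch assumes the optimal value minimizes an objective f that is positively homogeneous of degree one over a feasible set of the form αS, with S a cone:

\[
\operatorname{Opt}(\mathcal{C},\alpha)
 \;=\; \min_{S \in \alpha\mathcal{S}} f(S)
 \;=\; \min_{S' \in \mathcal{S}} f(\alpha S')
 \;=\; \alpha \min_{S' \in \mathcal{S}} f(S')
 \;=\; \alpha \operatorname{Opt}(\mathcal{C},1),
\]

where the substitution S = αS' is valid precisely because αS = S for every α > 0 when S is a cone.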






A Detailed Proof

A.1 Proof of Theorem 4.1


We can compute the fixed point of the recursion in Equation A.2 and obtain the following estimate … Then we compare these two gaps. To utilize Eq. 4 for policy optimization, we follow the analysis in Section 3.2 of Kumar et al. By choosing different regularizers, a variety of instances arise within the CQL family. … Equation B.36, called CFCQL(H), is the update rule we use. In discrete action spaces, we train a three-layer MLP network with an MLE loss; in continuous action spaces, we use the explicit behavior-density estimation method of Wu et al.
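The discrete-action step above, fitting an MLP behavior policy with an MLE loss, can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's implementation: the network sizes, the synthetic dataset, and the two-layer architecture (rather than the paper's three layers) are all assumptions made for brevity. The MLE loss is the mean negative log-likelihood of the dataset actions under a softmax policy.

```python
# Hypothetical sketch (not the paper's code): fit a behavior policy
# mu(a|s) by maximum likelihood, i.e. minimize the negative
# log-likelihood (NLL) of the dataset actions.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, HIDDEN = 4, 2, 16  # illustrative sizes

# Synthetic offline dataset: the "behavior" action depends on one feature.
states = rng.normal(size=(256, STATE_DIM))
actions = (states[:, 0] > 0).astype(int)

# Small MLP: state -> logits over actions.
W1 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS)); b2 = np.zeros(N_ACTIONS)

def forward(s):
    h = np.maximum(s @ W1 + b1, 0.0)            # ReLU hidden layer
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)   # softmax over actions
    return h, probs

def nll(probs, a):
    # Mean negative log-likelihood of the taken actions (the MLE loss).
    return -np.log(probs[np.arange(len(a)), a] + 1e-12).mean()

lr = 0.2
for _ in range(500):
    h, p = forward(states)
    # Gradient of mean NLL w.r.t. logits is (p - onehot) / N.
    g = p.copy()
    g[np.arange(len(actions)), actions] -= 1.0
    g /= len(actions)
    dW2, db2 = h.T @ g, g.sum(0)
    dh = g @ W2.T
    dh[h <= 0] = 0.0                            # ReLU subgradient mask
    dW1, db1 = states.T @ dh, dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

_, p = forward(states)
final_loss = nll(p, actions)
accuracy = (p.argmax(1) == actions).mean()
```

The fitted probabilities p would then play the role of the behavior density mu(a|s) inside the conservative regularizer; in a continuous action space this tabular softmax no longer applies, which is why the text switches to an explicit density-estimation method there.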